Week 12 of 12 · Part C — Governance

Scoping a Safety Program

The capstone begins where every real engagement begins — by deciding exactly what you're evaluating, and what "safe enough" means

Day 56 of 60 · ~60 minutes · Concept

The capstone, and why it's one document

For eleven weeks you've built artifacts: a threat model, a taxonomy and policy, a red-team plan, an eval suite, a robustness report, an alignment and interpretability brief, a risk register, a governance gap list. Each was real and each stood alone. This week you do the thing that actually makes someone a safety lead rather than a person with a collection of skills: you assemble them into one coherent safety evaluation program, a binder a review board could read and act on.

The thesis

A pile of strong artifacts is not a program. A program is artifacts tied to a single deployment, a single definition of "safe enough," and a single recommendation. The integration, making them point the same direction and tell one story, is the work this week, and it's the work that gets you hired.

And like every real engagement, it starts not with testing but with scoping. Before you evaluate anything you have to say precisely what you're evaluating, who could be harmed, and what bar the system must clear to ship. Get the scope wrong and every downstream artifact measures the wrong thing beautifully.

What a scope actually pins down

Core Theory

1 · The subject — what, exactly, is deployed

Not "a model" but a deployment: which model, in which product, with which capabilities and access. "Model X as a tool-using assistant inside a customer-support app, with retrieval over our docs" is a scope. "Is the model safe?" is not. Capabilities and access (tools, browsing, memory, code execution) are part of the subject because they're where most real risk lives.

2 · The safety goal — what "safe enough" means here

One deployment's acceptable risk is another's red line. Write the goal as a falsifiable bar: harmful-compliance rate below a threshold, no untested high-severity attack class, over-refusal under a ceiling. The goal is what your evals later either clear or don't — so it has to be decided before you see results, not after.
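One way to make the bar concrete is to write it down as data before any eval runs, then compare results against it mechanically. The sketch below is only an illustration: the class name, field names, threshold values, and attack-class labels are all hypothetical, and your own bar may need different measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the bar cannot be edited after it is registered
class SafetyBar:
    """Pre-registered definition of 'safe enough' (all values are hypothetical)."""
    harmful_compliance_threshold: float      # measured rate must be below this
    over_refusal_ceiling: float              # measured rate must be below this
    high_severity_attack_classes: frozenset  # every class here must be tested


def evaluate_against_bar(bar, harmful_rate, over_refusal_rate, tested_classes):
    """Return (passed, reasons) for measured results against the fixed bar."""
    reasons = []
    if harmful_rate >= bar.harmful_compliance_threshold:
        reasons.append("harmful-compliance rate not below threshold")
    if over_refusal_rate >= bar.over_refusal_ceiling:
        reasons.append("over-refusal rate not under ceiling")
    untested = bar.high_severity_attack_classes - set(tested_classes)
    if untested:
        reasons.append(
            "untested high-severity attack classes: " + ", ".join(sorted(untested))
        )
    return (not reasons, reasons)


# Illustrative numbers only; decide your own before you see results.
bar = SafetyBar(
    harmful_compliance_threshold=0.01,
    over_refusal_ceiling=0.05,
    high_severity_attack_classes=frozenset({"prompt_injection", "tool_misuse"}),
)
passed, reasons = evaluate_against_bar(
    bar, harmful_rate=0.004, over_refusal_rate=0.03,
    tested_classes=["prompt_injection"],
)
print(passed, reasons)
```

In this example both rates clear the bar, yet the program still fails because one high-severity class was never tested. Freezing the dataclass is a small nudge toward the same discipline the text asks for: the bar is set once, up front.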

3 · The assets and stakeholders — who's harmed and who signs off

Reuse your Week 1 threat model: who and what are you protecting, and who owns the go/no-go? A scope names the review board, the decision date, and the person accountable for the call. A program with no named owner is a document, not a decision.
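The three things a scope pins down can be captured in a small record that refuses to call itself complete while anything is unnamed. This is a minimal sketch under assumptions: the `ProgramScope` class, its fields, and the placeholder values are hypothetical, and the subject sentence is the example from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProgramScope:
    """Hypothetical scope record: subject, capabilities, and who decides when."""
    subject: str
    capabilities: list
    review_board: str
    accountable_owner: str
    decision_date: Optional[date]


def missing_fields(scope):
    """List the scope fields that are still empty or undecided."""
    missing = []
    for name in ("subject", "review_board", "accountable_owner"):
        if not getattr(scope, name).strip():
            missing.append(name)
    if not scope.capabilities:
        missing.append("capabilities")
    if scope.decision_date is None:
        missing.append("decision_date")
    return missing


scope = ProgramScope(
    subject=("Model X as a tool-using assistant inside a customer-support app, "
             "with retrieval over our docs"),
    capabilities=["tools", "retrieval"],
    review_board="Safety review board",  # placeholder name
    accountable_owner="",                # nobody named yet
    decision_date=None,                  # no go/no-go date yet
)
print(missing_fields(scope))
```

An empty result from `missing_fields` is the signal that the scope names a deployment, an owner, and a date. Anything it returns is a question to settle before testing begins.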

Reuse, don't rebuild

You already have a threat model from Week 1. Don't write a new one — instantiate it for this specific deployment. Scoping is mostly the act of taking general artifacts and binding them to one concrete system, with one timeline and one decision-maker.

Map your artifacts to the program before you write a word

The fastest way to scope is to lay out the binder's table of contents and write, next to each section, which artifact fills it. This turns the capstone from "write a huge document" into "assemble eight things I already built." It also surfaces gaps early: if a section has no artifact behind it, that's the work the rest of the week closes.
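The table-of-contents mapping can be as simple as a dictionary from section to artifact, with a check for sections that have nothing behind them. In the sketch below the section names follow the artifacts listed in this lesson, while the file names and the `find_gaps` helper are hypothetical.

```python
# Binder sections in reading order, taken from the artifacts named in this lesson.
BINDER_SECTIONS = [
    "Threat model",
    "Taxonomy and policy",
    "Red-team plan",
    "Eval suite",
    "Robustness report",
    "Alignment and interpretability brief",
    "Risk register",
    "Governance gap list",
    "Recommendation",
]


def find_gaps(sections, artifacts):
    """Return the sections with no artifact mapped to them, in binder order."""
    return [section for section in sections if not artifacts.get(section)]


# Hypothetical file names; replace them with wherever your artifacts live.
artifacts = {
    "Threat model": "week01_threat_model.md",
    "Taxonomy and policy": "week02_taxonomy_policy.md",
    "Red-team plan": "week03_red_team_plan.md",
    "Eval suite": "week04_eval_suite.md",
    "Robustness report": "",  # started but never finished
    "Alignment and interpretability brief": "week07_08_brief.md",
    "Risk register": "week10_risk_register.md",
    "Governance gap list": "week11_governance_gaps.md",
}

for section in find_gaps(BINDER_SECTIONS, artifacts):
    print("GAP:", section)
```

Here the unfinished robustness report and the not-yet-written recommendation show up as gaps, which is the list of work left for the rest of the week.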

The skeleton you're assembling

Threat model (W1) → taxonomy + policy (W2) → red-team plan (W3) → eval suite (W4) → robustness report (W5) → alignment + interpretability brief (W7–8) → risk register (W10) → governance gap list (W11) → recommendation. Tomorrow you start bolting them together; today you decide what they're all in service of.

Your work today

Scope the Program

~60 minutes

  1. Choose one realistic deployment you'll evaluate for the whole capstone — ideally the same system you've been using since your Week 1 threat model. Write the subject in one sentence, including capabilities and access.
  2. Write the safety goal as a falsifiable bar: the specific rates and red lines that mean "safe enough to ship." Decide it now, before any results exist.
  3. Instantiate your Week 1 threat model for this deployment — assets, top-5 risks, who's harmed — rather than starting fresh.
  4. Lay out the binder's table of contents and write, next to each section, exactly which prior-week artifact fills it. Circle any section with no artifact behind it.
  5. Name the review board, the accountable owner, and a realistic go/no-go decision date.

The expert move

A junior practitioner opens a safety engagement by testing. An expert opens by scoping — fixing the subject, the bar, and the decision-maker first — because a test against an undefined bar can't pass or fail, only flatter. The altitude jump is from "I ran evaluations" to "I ran a program against a pre-registered definition of safe enough, with a named owner and a decision date."

Say this in an interview: "Before I evaluate anything, I scope: which deployment, with which capabilities, against what falsifiable bar, owned by whom, decided by when. Pre-registering 'safe enough' is what keeps the eval honest — otherwise you're just grading the model on a curve you drew after seeing its answers."

Today's Takeaways